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Abstract 

Importance sampling has been known as a powerful tool to reduce the variance of 
Monte Carlo estimator for rare event simulation. Based on the criterion of minimizing 
the variance of Monte Carlo estimator within a parametric family, we propose a general 
account for finding the optimal tilting measure. To this end, when the moment generating 
function of the underlying distribution exists, we obtain a simple and explicit expression 
of the optimal alternative distribution. The proposed algorithm is quite general to cover 
many interesting examples, such as normal distribution, noncentral distribution, and 
compound Poisson processes. To illustrate the broad applicability of our method, we study 
value-at-risk (VaR) computation in financial risk management and bootstrap confidence 
regions in statistical inferences. 
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1 Introduction 



This paper considers the problem of estimating small probabilities by Monte Carlo simulations. 
That is, we estimate z = P{A) when z is small, say of the order 10~^ or 10~^ or so; i.e., A is a 
moderate deviation rare event. Such problems appear in the construction of confidence regions 
for asymptotically normal statistics; cf. Beran (1987), Beran and Millar (1986), Hall (1987, 
1992), Fuh and Hu (2004), and in the computation of value-at-risk (VaR) in risk management; 
cf. Jorion (2001), Duffie and Singleton (2003), Glasserman et al. (2000, 2002), Fuh et al. (2011). 
It is well known that importance sampling, where one uses observations from an alternative 
distribution Q to estimate the target distribution P, is a powerful tool in efficient simulation of 
events with small probabilities. Some general references are in Heidelberger (1995), Liu (2001), 
and Asmussen and Glynn (2007). 

A useful tool in importance sampling for rare event simulation is exponential tilting, cf. 
Siegmund (1976), and Bucklew (2004) and references therein. The above mentioned algorithm 
is more efficient for large deviation rare event, i.e., z is of the order 10~^ or less. Examples of 
such events occur in telecommunications {z = bit-loss rate, probability of buffer overflow) and 
reliability {z = the probability of failure before time t). To be more precise, when a sequence 
of random vectors converge to a constant vector fi, for any event A not containing /i, 

the probability P{X„ G A} usually decays exponentially fast as n — >■ oo. Efficient Monte Carlo 
simulation of such events has been obtained by Sadowsky and Bucklew (1990) based on the large 
deviations theory given by Ney (1983). 

For rn o derat e deviation rare e v ent simulations, effic ient importance sampling has been studied 
bv I.TohnsI fll988h . IPavisonl f ll988h . IDo and Halil f ll99lh . and Fuh and Hu (2004, 2007). However, 
those papers concern one- and/or multivariate-normal distributions. Extension to heavy-tailed 
settings such as multivariate t distribution can be found in Fuh et al. (2011). The goal of this 
paper is to provide a simple general account for exponential tilting importance sampling, which 
covers all previous results and many other interesting examples. 

It is worth mentioning that for events of large deviations P{X G A}, Sadowsky and Buck- 
lew (1990) showed that the asymptotically optimal alternative distribution is obtained through 
exponential tilting; that is, Q{dx) = C exp{6x)P{dx), where C is a normalizing constant and 6 
determines the amount of tilting. The optimal amount of tilting 6 is such that the expectation 
of X under Q-measure equals the dominating point located at the boundary of A. However, for 
moderate deviation rare event, we show that typically the tilting point of the optimal alternative 
distribution is in the interior of A, which is different from the dominating point of large deviations 
theory. Furthermore, by using the idea of conjugate measure of Q, Q{dx) = C exp{—9x)P{dx), 
the general account of our approach characterizes the optimal tilting 6, by solving the equa- 
tion of the expectation of X under Q-measure equals the conditional expectation of X under 
Q-measure given the rare event. 

There are three aspects in this study. To begin with, we obtain an explicit expression for 
the optimal alternative distribution under exponential embedding family. Second, the proposed 
algorithm is quite general to cover many interesting examples, such as normal distribution, non- 
central distribution, and compound Poisson processes. Third, the derived tilting formula can 
be used to calculate portfolio VaR under jump diffusion models, and to approximate bootstrap 
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confidence regions of parameters in regression models. 

Tlie rest of this paper is organized as follows. In Section 2, we present a general account 
of importance sampling that minimizes the variance of the Monte Carlo estimator within a 
parametric family, provide a recursive algorithm for finding the optimal alternative distribution, 
and approximate optimal tilting probability measure for moderate deviation events. Section 
3 presents several frequently used examples to which we can characterize the optimal titling 
probability measures, and reports the relative efficiency of the proposed method with respect to 
the naive Monte Carlo through a simulation study. In Section 4, we demonstrate the performance 
of the tilting formula by investigating two examples: calculating portfolio VaR and bootstrapping 
confidence regions. Concluding remarks are given in Section 5. The proofs are deferred to the 
appendix. 



2 Importance Sampling 

2.1 A general account in importance sampling 

Let {Q, J-", P) be a given probability space, X be a random variable on Q and A be a measurable 
set in R. To estimate the probability of an event {X G A}, we shall employ the importance 
sampling method. That is, instead of sampling from the target distribution P of X directly, we 
sample from an alternative distribution Q := Qg. Suppose X has moment generating function 
"^{0) = E[e^^] under P for 6* G R. Then we consider the exponential tilting measure Q of P, 
which has the form 



dQ 
dP 



Em 



where ip{6) is log \E'(^), the cumulant generating function of X. The question is how to choose an 
alternative distribution Q so that the importance sampling estimator has the minimum variance. 
The importance sampling estimator for p = P{X G A} based on a sample of size n is 



Pv 



dP 

i=l 



where Is is the indicator function of an event -B, Xi 



(2.1) 



n, are independent observations 



from Q, and dP/dQ is the Radon-Nikodym derivative assuming P is absolutely continuous with 
respect to Q. Set 



Gie) = Eq 



dP' 


2 


dP' 




= E 





E[l 



{xeA}e" 



-ex+ip{e)i 



where the expectation without any qualification is under the target probability measure P un- 
less otherwise stated. Observing that, since the estimator pn is unbiased, the variance of the 
importance sampling estimator is 



vaTQ^pn) = n ^{G{d) 



P 
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We remark that when A = {u : X{uj) > a} for some constant a > 0, large deviation theory 
considers the following inequality 

G{e) < e-^'^+-^'(^), 

and minimizes the above upper bound. The first-order condition (after taking logarithm) gives 

i/j'{e) = a. (2.2) 

Note that the approximation f l2.2p is more accurate when P{X > a} is small, i.e., a is sufficient 
large. In contrast to the large deviation theory, our goal is to solve the optimization problem 

for e 



r = argminG(^). (2.3) 

6 

9* is hence the desired quantity for selecting the tilting measure that minimizes the variance of 
the importance sampling estimator within a suitable parametric family. 
To minimize G{9), the first-order condition gives 

^ = Em^,^^e-'^^^^'\-X + i;'{e))] = 0. 

Dividing e'^*^^-' for both sides of the above equation, we have an equivalent condition, 

E[l{X6A}e-^^(-X + ^'(^))]=0, (2.4) 
and therefore 6* is the solution of 

no) = (2.5) 

Ideally, closed-formed formulas of the right-hand-side (RHS) of fl2.5p needs to be derived via 
analytic procedures. Then, standard numerical procedures (or a recursive algorithm presented 
in Section 2.2) can be applied to find 6* satisfying (12. 5p . However, the derivation for a closed- 
formed formula of the RHS of (12. 5p may be tedious and complicated in general cases. In this 
respect, the tilting measure given in (12. 2p in the large deviation theory seems to be preferred 
because of its simplicity. To prove that the RHS of (12.51) can be represented in a simple formula, 
we consider the conjugate measure Q := Qg of the measure Q, which is defined as 

dQ _ e _ gx-^{9) r,-. 
dP ~ E[e-^^] ~ ^ ' ^ 

where ^{9) is log'^(^) with ^>{e) = E[e~^^]. 

To present a connection between Q and Q, we consider their probability densities with respect 
to Lebesgue measure C It is straightforward to see that ip{6) = ip{—6), which implies 

(^Qo ^ -6»x-0(6»)^ ^ -ex-ip{-e)^ ^ (-e)x-i){-e)'^ ^ dQ^e 

dC dC dC dC dC ' ^ ' ' 
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The following theorem states the existence and uniqueness for the optimization procedure 
(12. 3p . and provides a simplification on the RHS of fl2.5p using Q. Before that, we need a condition 
for '^{9) being steep to ensure the finiteness of the moment generating function ^{0). To define 
steepness, let 6'max := sup{^ : ^{9) < oo} (for light-tailed distributions, we have < ^max < C)o). 
Then steepness means \E'(6') — oo as ^ — )■ 6'max- 

Theorem 1 Suppose the moment generating function '^{9) of X exists and second order con- 
tinuously differentiable for ^ G I C R. Furthermore, assume that "^{9) is steep and E[X\X G 
A] > E{X) := jji. Then there exists a unique solution for the optimization problem (12. 3p . which 
satisfies 

ij'{9) = EQ^[X\XeA]. (2.8) 

The proof of Theorem [T] will be given in the Appendix. 
In summary, the three-step procedures to find 9* are 

1. Calculate the cumulant generating function il){9) of X and its derivative ip'{9). 

2. Find the exponential tilting measure Q. 

3. Find 9* as the solution of tp'{9) = Eq^[X\X G A]. 

Remark 1: Note that the optimal tilting point 9* obtained in (12. 8 p highlights the fact that the 
tilting probability measure depends on the likelihood ratio (or the embedding probability Q in 
terms of Q), the region, and the statistics of interest. Simple cases such as normal distribution 
and t-distribution are in Fuh and Hu (2004), and Fuh et al. (2011), respectively, by using the 
technique of change of variables. The idea of using conjugate measure in Theorem [1] seems to 
be new and simple according to our best knowledge. 

Remark 2: In addition, characterization (12. 8p is easy to implement as will be illustrated in 
Sections 3 and 4. Furthermore, it provides an insightful interpretation of 9*, which can be used 
to compare with the large deviation tilting. To be more specific, when A = {w : X{w) > a} for 
a > fj,, it is known that 9* satisfies 4''{9) = Eq^X. (12.81) indicates that the optimal tilting 9* 
satisfies 

EQ^X = EQ^[X\X>a]>a. (2.9) 

A comparison between (12. 9p with (12. 2p shows that the large deviation tilting parameter is the 
dominating point a; while the optimal tilting 9* is inside the region of {w : X{w) > a}. Similar 
interpretation can be applied to the case of A is a convex set in R'^. However, when A = {w : 
X{w) G {—a,ay}, where c denotes the complement, it is nature to spit A = {w : X{w) > 
a}U {w : X{w) < —a} and apply importance sampling for each part. A simple guidance for the 
stratification, based on relative efficiency, can be found in Fuh and Hu (2004). 
Remark 3: To approximate z = EZ where Z > P-a.s.. Define P*{dw) = -^^^P{dw) and 

L* = It is known (cf. Theorem 1.2 in Chapter V, Asmussen and Glynn, 2007) that 

the importance sampling estimator ZL* under P* has zero variance. By using similar idea, 
tilting probability obtained by (12. 8 p can be regarded as an approximation of the zero variance 
importance sampling estimator within a parametric class. 
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2.2 Calculating the optimal alternative distribution 



Before employing the importance sampling method, it is first necessary to identify the optimal 
alternative distribution. Since the optimal 6 in f l2.8p cannot be computed directly, to find 6 in 
the third step, we consider a simple recursive procedure for a general equation, 

g{0) = hie), (2.10) 

for some functions g{9) and h{9). In our setting, g{9) = ip'{9) and h{6) = Eq[X\X G A]. 
A simple recursive procedure is implemented as follows. 

1. Start with an arbitrary 6^. Set i = 1. 

2. Calculate f = gi9'~^) and find 9' as the solution of h{9) = t\ 

3. Set i = i + 1. Return to 2 until {9^^^ — 9^)/9'' becomes very small. 

Because the exact objective function G(-) is difficult to optimize directly, the proposed 
method replaces G with a quasi objective function and does optimization on the latter. In 
the preceding algorithm, the initial value 9^^^ can be chosen as a dominating point of the event 
{X G A}. Furthermore, if 6*^°^ is sufficiently large, the density of X decreases rapidly and the 
solution of fl2.5p is close to 9^^\ Therefore, fast convergence of the recursive algorithm is to be 
expected. 

Let 9* be the solution to fl2.10p . i.e., g{9*) = h{9*). By using an argument similar to Theorem 
2 of Fuh et al. (2011), we have the following proposition. 

Proposition 1 Choose a dominating point of the event {X G A\ as the initial value 9^^\ Then 

i) the recursive algorithm either converges to 9* or alternates in the limit between a pair of 
values 9y^ 9 satisfying 

g[9) = h{9) and h{9) = g{9). (2.11) 

ii) // there does not exist 9_^9 such that (12.111) holds, then the recursive algorithm converges 
to the solution of (I2.10p . 

In this subsection, we introduce a simple recursive algorithm. Alternative root-finding algorithms 
such as bisection method, Newton's method, secant method, etc. can also be used to find the 
root in (IXTUD . 

2.3 Approximating optimal tilting probability measure 

In this subsection, we get the optimal tilting probability measure by approximating ip'{9) for a 
moderate deviation event. Let Xi, . . . , X„ be i.i.d. random variables from a distribution function 
F, with mean fi and variance a^. We want to estimate by simulation of the probability 

P{SJn < an), (2.12) 

for some a„, where S'„ = Yl^=i -^i- Note that the probability is small when an — fi < 0, and 
cin — /J' = O {1) . Assume that the moment generating function of Xi exists for some 9 belonging 
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to some interval G which contains the origin, and let ip{6) := log-E'(exp(^Xi)) denote the 
cumulant generating function of Xi. Note that /i = 'ip'iO), = ip"{0). By using the technique 
developed in Section 2.1, we first embed the original probability P in the following exponential 
family 

^(x)=exp(ex-^(^))^(x). (2.13) 
The class of estimators considered for the probability defined in fl2.12p is 

*„ = /(S„/„<a„)n^. (2.14) 

where {Xj} are distributed from Qq. By Theorem 1, we have that if = and a„ = a < 0, the 
optimal point 6* in simulations is given by ip'{0*) = Eq(^q*'^[X\X > a]. Or ip'lO*) = a for large 
deviation tilting. 

The starting point of the approximation is that an — ii = o(l), or that the probability (12.121) is 
not exceedingly small. For instance one may be interested in the probability P{Sn — nfi < aa^/n) 
for a < 0, and this leads to a„ = yU + aa/ y/n. The point 6* can be approximated in the following 
manner by a one-term Taylor expansion of ip'{6*): 

^'(0) + e*4j"{0) ^ Eq^, [X\X > a„] ^ /i + a^e* = Eq^, [X\X > an]. (2.15) 

The calculations above suggest that when an — ^ 0, expansions for (12.131) and the optimal 
choice of parameter 6* can be obtained through the first few moments of F (more specifically, 
through the mean and variance) without appealing to the properties of moment generating 
function. Therefore it is illuminating to consider a slightly different approach to the problem of 
estimating (I2.12p . with a„ = fi + aa/^/n, especially, for heavy-tailed distributions. An alternative 
approach for heavy-tailed distribution without moment generating function is via the method of 
transform random variables, cf. Asmussen and Glynn (2007). 

Example: Pareto distribution. Let f{x) = a{l + x)~"~^ and f{x) = a{l + x)~"~^, where 
a — 0. Here we consider a = a6{a), and 6 (a) is a solution of 

fi + a^e = EQ^[X\X>an]. (2.16) 

Remark 4: Note that in (I2.13p . we apply the idea of exponential embedding for a non-parametric 
distribution F. Further approximation can be done on (I2.14p via the LAN family. This idea had 
been carried out in the bootstrap setting of Fuh and Hu (2004, 2007). More detailed analysis 
along this line and the comparison of non-paremetic importance sampling in Zhang (1996) and 
Neddermeyer (2009), for instance, will be published in a separate paper. 

3 Examples and Simulation Study 
3.1 Examples 

To illustrate the general account of importance sampling, in this subsection, we study three 
examples: normal distribution, noncentral distribution and compound Poisson processes. 
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and report several other interesting distributions. The simulation event is {X > a} for a given 
random variable X and some a > 0. In these examples, we explicitly calculate a closed-form 
formula of Eq[X\X G A] when possible. 



Example 1: Normal distribution 

Let X be a random variable with standard normal distribution, denoted by A^(0, 1), with 

dP 
dC 



probability density function (pdf) 4^ = e ^^/^/a/^tt. Standard calculation gives "^{9) = e^^^^ 



il^{6) = 6*^/2 and tp'{6) = 9. In this case, the tilting measure Q is A^(^, 1), a location shift, and 
Q is N{—9, 1). Applying Theorem [1] and using the fact that > a} is a truncated normal 

distribution with minimum value a under Q, 9* needs to satisfy 

^-i-Ha + 9)-^- ^^-^^ 



Alternatively, G{9) = e {1 — ^{a + 9)), and 9* must satisfy the first-order condition, 2^^(1 — 
$(a + 9)) = (/.(a + 9), or equivalently, ^ = 0(a + 9)/2{l - $(a + 9)). By using ~ ^ as 

X — > oo, it is easy to see from equation (13.1 p that 9* a when a is large. This is the same as 
the large deviation tilting probability. 

We remark the normal distribution has been analysed in Fuh and Hu (2004), for illustration, 
we consider this example from our general account and provide a simple and explicit character- 
ization for 9*, in the sense that the right-hand-side of (13. ip is a straightforward application of 
Theorem [T] 

Example 2: Noncentral ^^{X^k) distribution 

Let Zi be independent, normally distributed random variables with mean /ij and variances af, 
for i = 1, . . . , K. Then the random variable X = Yli=i (f^) distributed according to the 
noncentral distribution. It has two parameters: k, which specifies the number of degrees of 
freedom, and A, the noncentrality parameter, is defined as A = Yli=i (^^) • 
The pdf of X is 

^ = ^e-(^+'^/^f^V^'^\/2_i(v^), a;>0, (3.2) 



dC 2 

where Iv{z) is a modified Bessel function of the first kind given by 

Alternatively, the pdf of X can be written as 

-Jr=2^ ^\ fy^+^^i^)^ ^ ^ 0' (3-3) 

i=0 
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where Yq is a chi-square distribution with q degrees of freedom, denoted by Yq ~ By 
representation (13. 3p . the noncentral distribution is a Poisson weighted mixture of central 
distributions. Suppose that a random variable N has a Poisson distribution with mean A/2, and 
the conditional distribution of X given N = i is x"^ with k + 2i degrees of freedom. Then the 
unconditional distribution of X is noncentral with k degrees of freedom, and noncentrality 
parameter A, or, 

X\{N = i} X^{f^ + '^i), N Pois{\/2). 

Let the likelihood ratio g = e^^"'^^^), where ip{e) = log£'[exp(6'X)]. To derive the tilting 
formula of X, note that *(^) = exp (y^)/(1 - 2^)'^/2, ^(^) = ^zfe - f log(l - 2^), and ^'{6) = 



A+k(1-26>) 
(1-20)2 • 



Therefore, the exponential tilting measure Q becomes 

dQ dQdP e-t(|)^ i ^ 



EC- ^ 1 D I 1 K. + 2Z 1 X , , 
3.4 



dC dPdC ^ i\ rf^)2 2 



00 



E 



e 2(1-29) 



A 



2(l-26»)'' i i£+2 

a; 2 



j=o V 2 /vi-2e/ 



Hence, by (13. 4p X can be characterized hy X\{N = i} (^, ^) with N ~ Pois ( 2(1: 
under Q. By Theorem [1], 6'* is the solution of the following equation 



261) 



= ^ol^l-^ > "1. (3.5) 

where X follows X\{N = z} ~ F (^, with iV ~ Pozs {^imf)) ^^^^^ '5- 

When A = 0, this reduces to x^{k) distribution and the optimal 9* needs to satisfy 

EQ[X\X>a]. (3.6) 



1-29 



When the degree of freedom k equals 2, x^{2) distribution reduces to the exponential distribution 
with mean 2. By the memoryless property, we have an explicit formula for C{9) and hence 9* 
needs to satisfy = a + As a result, we have an explicit solution for 



-1 + VT 



a 



2 



9* = — — ' < 1. (3.7) 

a 

Note that the tilting formula (13. 7p can also be found in Example 1 on p. 22 of L'Ecuyer, Mandjes 
and Tuffin (2009). 

Example 3: Compound Poisson process 

Let Rt = ^n=i be the compound Poisson process, where jump event N{t) is assumed 

to follow a Poisson process with parameter A, and the jump sizes Yn are assumed to follow a 
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lognormal distribution with parameters of location rj and scale 6^. Let A = {Rt > Vp} for a 
given Tp, and compute 

P{Rt>rp) = Y.^^P{Y^Z,>r,\N{t) = n) 

n=0 ' i=l 

^ ^! Vn6^ 

where Zi, i = l,...,n, are i.i.d. normal with mean 77 and variance 5^ and Z = ^'^^^^ 
Denote f{Rt) = Rt — Tp, and let the likelihood ratio 

^ = ^em.)-m^ (3.8) 

where V(^) = logE[exp(^/(i?i))]- 

Here the original measure P under N{t) ~ Pois{Xt) and logY^ ~ ^{1], 6"^) becomes 



d£ n! '^2^ 

and 

n.=l 

n.= l 



00 



^ 2 



Therefore, V'(^) = \t{e^''+^^''^^ - 1) - ^^p, and V^'(e) = Xte^i+'^^^^^rj + 65'^) - r^. Hence, the 
likelihood ratio fl3.8p becomes 



dP 



For a given A^(t) = n. 



As a result, Rt = ^i^^ Z, ~ iV(r/ + ^52^52) g^^^j jyj^^^ _ Pois{Xe'^i+^^^^/H) under Q. By 

Theorem [H ^* needs to satisfy 

Xte^v^e-'^^ + ^5^) - r, = [Rt\Rt > rJ , 

where Rt = Et? with Pf|{Ar(t) = n} ~ iV(r/ - ^^2^52) and N{t) ~ Pois{Xe-^'^+^^^^ 'H) 
under Q. 
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Table 1 reports importance sampling tilting probability for eight distributions: Binomial, Pois- 
son, exponential, normal, x^i Gamman, noncentral x^i ^iid uniform. The case of t distribution 
can be found in Fuh et al. (2011). This table includes the family of tilting probability Q and 
ip'{0). Note that in Table 1, when A = {X > a} for some a > 0, then > a] = a+j^ for 

exponential distribution E{1) and = ^^^(^^g-) — for standard normal distribution. By Theorem 

m we have that Q = Q-e in Examples 1-7. 

Table 1: Summary of distributions and their tilting measures. 

Ex P Q ^lj'{e) 

1 Bin,p) B{n,^;^) ^ 



2 Pois{\) Pois{\e^) Ae^ 

3 iV(0,a2) N{e,a^) Oa^ 



4 E{1) E{-^) ^ 



1-26 



5 x\^^) r(f,^) 

6 r(a,/3) r(a,^) ^ 

7 NCx\n,\) X\{N = ^}^T{^,J^\N^Po^s{^) 

8 f/m/orm(0,l) ex e^^l|o<x<i} 
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Table 2: Relative efficiency for different distributions. 



p 


iV(0,l) 


m 




r(4,io) 


X'(2,10) 


0.0001 


2409.74 


818.53 


603.61 


1282.87 


1529.46 


0.001 


290.90 


109.88 


82.74 


166.00 


192.20 


0.01 


38.06 


16.57 


12.90 


23.74 


26.54 


0.05 


9.98 


4.99 


4.04 


6.76 


7.34 


0.1 


5.77 


3.13 


2.60 


4.10 


4.38 



3.2 Simulation Study 

In this subsection, we present some numerical results on relative efficiency using the method 
outlined in Theorem 1 for estimating tail probabilities oi p = P{X > a} for some a > 0. The 
relative efficiency is defined as the variance ratio between the crude Monte Carlo estimator and 
the importance sampling estimator. 

Table 2 compares the relative efficiency (RE) for various distributions. Here a is chosen 
such that p = 0.1 (central event), 0.05, 0.01 (95%, 99% confidence interval), 0.001 (VaR type 
probability) and 0.0001 (rare event). Monte Carlo sample size is 10, 000 for each case. Note that 
the relative efficiency in normal case is the largest. 

Simulations are conducted for central x^('^)) noncentral x^i^^^^) gamma r(a,/3) dis- 
tributions. Since the results are almost the same, we only report the simulation results for 
noncentral x^i-^^^^) distribution in Table 3. We can see that the proposed method is signifi- 
cantly more efficient than the naive Monte Carlo in 10, 000 simulations for all probabilities of 
p = 0.1, 0.05, 0.01, 0.001 and 0.0001. Furthermore, the efficiency gain is larger for smaller 
probabilities against the naive method. The normal case can be found in Fuh and Hu (2004), 
which will be used, along with its square and compound Poisson model, for VaR computation. 
Normal distribution and distribution will be used for importance resampling in bootstrapping 
confidence region of parameters in regression model. 
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Table 3: We compare the performance, in terms of means and standard errors (SE), of estimating 
the probability P{NCx^{n^ A) > a) using the naive Monte Carlo method (MC) and our impor- 
tance sampling method (IS) for a combination of parameters (k. A) and a variety of boundary 
values a with a sample size of 10,000. p is the theoretical value of the probability of interest. 
9* is the optimal tilting parameter. Both the relative efficiency (RE) and the benchmark RE 
(RE*) are reported, where the former is calculated as the ratio of the variance of the Monte 
Carlo estimate over that of the importance sampling estimate, and the later is calculated as 
p(l-p)/(G(r)-p2). 



(k, X) p a 9* MC IS RE RE* 



mean SE mean SE 



(2, 1) 


0. 


,0001 


24, 


.28 


0, 


,38 


0, 


.000000 


0, 


.000000 


0, 


,000097 


0, 


.000003 


0, 


.00 


1059. 


,49 




0. 


,001 


18, 


.65 


0. 


,36 


0, 


.001200 


0, 


.000346 


0, 
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4 Applications 



To illustrate the applicability of our proposed tilting formula, we present two applications in this 
section. In subsection 4.1, we apply the importance sampling for portfolio VaR computation, to 
which the tilting formula is given under the multivariate jump diffusion model of the underlying 
assets. In subsection 4.2, we study importance resampling for bootstrapping confidence regions 
in regression models. By using the idea of pseudo-maximum likelihood estimator (PMLE) and 
the criterion of minimizing variance of the Monte Carlo estimator under a parametric family, we 
propose a titling formula based on the parametric family of normal (or x^) distributions. 



4.1 Evaluating Value at Risk 



As a standard benchmark for market risk disclosure, VaR is the loss in market value over a 
specified time horizon that will not be exceeded with probability 1 — p. Hence define VaR as 
the quantile Ip of the loss L in portfolio value during a holding period of a given time horizon 
At. To be more specific, we express the portfolio value V{t, S{t)) as a function of risk factors 
and time, where S{t) = {S^^\t), . . . , S^'^\t))'^ comprises the d risk factors to which the portfolio 
is exposed at time t and T denotes the transpose of a matrix. The loss of the portfolio over the 
time interval [t, t + At] is L = V(t, S(t)) — V{t + At, S(t + At)). Therefore VaR, Ip, associated 
with a given probability p and time horizon At, is given by 



P{L > Ip 



p. 



(4.1) 



Assume S{t) follows a c?- dimensional jump diffusion model such that the return processes are 
described by the stochastic differential equations 



dS^'\t) 
S('){t) 



N{t) 



fi^'^dt + a^'^dW^'^it) + ^logi;. 



1,2, 



, d. 



(4.2) 



where {W^^\t),. . . ,W^'^\t)) is a standard Brownian motion in R^, N{t) ~ Pois{Xdt) is a Pois- 
son process, logl^*'*^ are independent and identically distributed (i.i.d.) random variables with 

A^(?7*-'\ (J*-*-*^) distribution. Here the drift parameters volatility parameters a^^\ jump fre- 
quency A, and jump size parameters r]^^\ 5*-*^ are given. Furthermore, we assume that the Brow- 
nian motion and the Poisson process are independent. Denote logY,- ~ N(ri, 6^'Zj6_), where 
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For simple notations, denote 
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a^'^^VAi 


, x = 
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1 Pl2 
Pl2 1 

Pld ■ ■ ■ 

A discrete version of fl4.2p is 



Pld 

P2d 



J 



(1) 



N{At) 



(4.3) 



Next we shall describe a quadratic approximation to the loss L. Let r = (r'^^\ . . . ^r^'^^Y' be 
a vector of return of d assets. Denote AS" = [S{t + At) — S{t)] r as the change in S over 
the corresponding time interval. The delta-gamma methods refine the relationship between risk 
factors and portfolio value by including quadratic as well as linear terms. The delta-gamma 
approximation to the change in portfolio value is 

Ot r -i 

V{t + At, 5 + A^) - V{t, S) ^ —At + 6^ AS + -AS^TAS, 

0b Zi 



where b = . . . , S^f and Y = [Tij]ij=i 

r 



_ dV 



.,d, with 

_ d'V 
" dSidSj ' 



and all derivatives are evaluated at the initial point {t, S). Hence we can approximate the loss 

ao + a^r + r^Air, (4.4) 

where = —^At is a scalar, ai = —6 is an (i- vector and Ai = —T/2 is a symmetric matrix. 
Under the jump diffusion process assumption of r, we have 

L = ao + p + a^aX + J + fi^ Aifi + AiZa 
= bo + aJaX + a^J + aJ'Z'^AiXa, 



(4.5) 



where bo = + ajp + p^A.^^ and J = (Efjf log y}'^ , . . . , EfJf log y}'Y- 

To have a simple approximation, we neglect the quadratic approximation of the jump part 
in f l4.4p as it is very small compared to the return of portfolio. Let C be the square root of the 
positive definite matrix S such that C'^C = S. We can transform the distribution of X into 
Z, which is multivariate normal distributed with identity covariance matrix, so that X = CZ. 
Moreover, C can be chosen so that C'^AiC is diagonalized with diagonal elements Ai, 
Denote := L — bo, then 



af aX + af J + a^X'^A^Xa = a{aCZ + a' Z' C AiCZa + a{J 

d 

b^Z + g^Z^KZa + af J = ^ b^Zj + \j{a^^^f AtZ'j + af J, 



T ryTr^T 



(4.6) 
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where 6^ = a^aC. 

By using the tilting formula developed in Theorem [1] of Section 2, we consider a family of 
alternative distributions. Let the likelihood ratio of the alternative probability measures with 
respect to the target measure P be of the form 

^ ^ ^d(L,-rp)-m ^ (4 7) 

where L{, is defined in (14. 6p . is the pth quantile, and ip{9) = \ogE[exp{9{Lb — rp))] is the 
cumulant generating function of Lf, — under the target probability measure P. The domain 
of 6 will be specified after Equation fl4.12p . The problem of finding the optimal alternative 
measure is then reduced to that of identifying the ^-value that yields the minimal variance for 
the importance sampling estimator. 

We now proceed to find the probability density under the alternative measure Q. A simple 
calculation leads that 



4!{e) = logE{e^^'-^'^) (4. 
= lyf (f^;)' -log(l-2gA,(aW)^At)^ 



Oalri + -e'^al{5^T.j5)ai 



pi 



and 



^ ^ ' (1 - 2^A,(a«)2At)2 ^ 1 - 2^A,(a«)2At 



+AAte - 2 -^-^ (afr/ + ^af(5^Sj5)ai). (4.9) 

From (14.71) . and (14.81) . it follows that the importance sampling is done with an exponential 
tilting measure. Therefore, given A^(At) = n, 



_ iQiP 1 f W (410) 



X 



(27r)'^/2|5 



where 



and 



Qh 

^'^^^ ^ l-2^A,(aW)2At' ""^'^^^ " l-2eA,(aW)2At' ^^'^^^ 
A(^) = AAte - 2 -"-^ , r/(^) = r/ + ^af(5^Ej5). (4.12) 
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That is, in f l4.10p Zj, j = l,...,d, are independent N{fij{9),a'j{9)), and Jj are compound 
Poisson processes with jump frequency \{9) and jump size V{i), i = 1, . . . ,n has mean ri{9) 
and variance matrix 6^Tjj6_. To guarantee the rational of Q under exponential twisting of 
measures, the constant 9 must satisfy 1 — 2^A(i)((T^'^)^At > and 1 — 2^A(d)((j'^*))^At > 0, where 
A(i) = maxi<j<d Xi and A(rf) = mini<i<d Aj. 

To have an efficient importance sampling for approximating P{Lb > Vp} for some Vp > 0, we 
need to characterize the optimal titling 9 via Theorem [H Before stating the result, we define 
some quantities that facilitate the presentation of it. In view of (14.111) . define 

/i,(^)=/i,M), a]{9) = a]{-9), X{9) = X{-9), r^{9)=v{-9). (4.13) 

Let V{j) has a (i-variate normal distribution with mean fj(9) and variance matrix 6^Tij6 and 
N{At) follows a Pois{X{9)). Applying Theorem [H it is straight forward to obtain the following 
corollary. 

Corollary 1 Let 9 be such that 1 ± 29X(^i){a^'^fAt > and 1 ± 29X(^d){(r^^fAt > 0. Under the 
quadratic approximation to a portfolio VaR, the optimal alternative distribution Q minimizing 
the variance of the importance sampling estimator has 9 satisfying 

^\9) = EQ[L,\L,>rp], (4.14) 

where ip'{9) is in \4.9^ , Eq is the expectation under Pq, which has the form (I4.10p but replaced 
by the three joint distributions: N[j2j[9),aj{9)), Pois{X{9)) , and V{j). 

The optimal 9p satisfying (14.141) can be searched by standard numerical methods or the re- 
cursive algorithm presented in Section 2.2. Next we present the importance sampling algorithm 
for multi-variate jump diffusion model as follows: 

1. Compute the 9p such that ip'(9p) = Eg^[Lf,\Lb > rp]. 

2. (i) Generate {Z,, . . . , Z^f ^ {N{fi,{9p),ali9p)), . . . , N{fia{9p),aj{9p))f . 
(n) Generate iV(At) ~ Pms{X{9p)). 

(iii) Given N{At) = n generate 

V{i) = (logr/^\ logr/'\ . . . , logY^'^f ~ N{T^{9p)A^^A), z = l, . . . , n. 

3. Repeat step 2 k times to have L^^i defined in (14. 6 p for i = 1, . . . , /c. Compute p{9p) = 

4. Repeat steps 2 and 3 with Monte Carlo size M, and compute sample variance vafp{9p). 
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In Table 4, we compare the relative efficiency (RE) with fifteen risk factors of the proposed 
method (PSD) with respect to the naive method (NV) in estimating the loss probabilities P{Lh > 
x} with different values of x. Here the relative efficiency, RE(Methodl, Method2), is the variance 
under Method 2 divided by that under Method 1. The parameters are A = 1, rj^^^ = r]^'^^ = . . . = 
^(15) ^ 0^ ^(i) ^ ^/;loO, /i« = i/lOO, a^^ = 0.1 + i/100 and pij = 0.3 for all i ^ j, sample 
size k = 1,000 and Monte Carlo replication M = 10,000. In Table 4, we set h = 0.044, 
62 = 0.0589891, 63 = 0.0720326, 64 = 0.0838734, 65 = 0.0948957, be = 0.105325, 67 = 0.115304, 
bs = 0.124929, 69 = 0.134271, 610 = 0.143378, bu = 0.152289, 612 = 0.161034, 613 = 0.169635, 
614 = 0.178112, 615 = 0.186479, Ai = 5.2 and A2 = . . . = A15 = 0.7 with quadratic approximation 
function for a fifteen-variate jump diffusion model. We can see that the proposed method is 
significantly more efficient than the naive Monte Carlo in 10, 000 simulations for all values of x. 
Furthermore, the efficiency gain is larger for smaller probabilities against the naive method. 

Table 4: Quadratic approximation function compared with naive method and a multi-variate 
jump diffusion model in Monte Carlo simulation. 





0.824 


1.166 


1.549 


P 


0.0500 


0.0100 


0.00100 


NVp 


0.0501 


0.00998 


0.00099 




4.80E-05 


l.OOE-05 


9.85E-07 


PSD p{9) 


0.0501 


0.00999 


0.000100 




5.12E-06 


2.69E-07 


3.65E-09 


mm,p) 


9.36 


37.15 


269.79 



p denotes the true tail probability, denotes the quantile of p, p and voTp are the mean and variance of 
the probability estimator with Monte Carlo, p{9) and varp(9) are the mean and the variance of the tail 
probability estimator with importance sampling, IlE{p{9),p) is the relative efficiency p{6) relative to 
p in a multi-variate jump diffusion model. 

4.2 Bootstrapping confidence regions in regression model 

Consider a regression model 

p 

Yi = '^Xij(3j + Ei for i = l,2, ...,n, (4.15) 

i=i 

where p > 2, ei, . . . , e„ are i.i.d. mean zero random variables with distribution F{-). Denote 
as the variance of ei. Let Xi = {xn, . . . , Xip)'^, (3 = (/3i, . . . , /3p)'^. In vector and matrix notation, 
we write Yi = xf /3 + Si for i = 1, 2, . . . , n, and 

Y = X/3 + e, (4.16) 
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where X = {xij)nxp, Y = (Fi, . . . , Yn)'^ and e = {ei, . . . , Sn)'^ ■ Then the least-squares estimator 
of /3 is 

/3 = (X^X)^^X^Y. (4.17) 

Consider the problem of finding a confidence region C in that covers the vector /3 with 
prescribed probabihty (1 — a). Denote f3o as the true parameter. Under the assumption of X is 
full rank p, the statistics of interest is 

^ ^ (/3-/3of(X^X)(/3-/3o) ^ 
pa'^ 

where is the sample variance. In particular we would like to estimate the probability of 
the event {T G A}, where A is chosen to be a circular confidence region for the unknown 
parameter /3o. Let T* = — /3)"^(X^X)(/3* — (3) /pa* , where cr* are the bootstrap 
estimators of $ and a^, respectively, for given data. Then the bootstrap estimator of P{T e ^1} 
is M = P{T* e A\e} = E{l^j,,^j^y\i), where i = {ii, . . . 

To illustrate our proposed method, we select a sample from a regression model in Longley 
(1967) in which the data set is available at 

|http7'/www. itl.nist.gov/div898/strd/general/dataarchive. html We choose this classical data 
set of labour statistics because it is one of the first used to test the accuracy of least squares 
computations. It is noted that the same algorithm can be applied to current statistical data set. 

The response variable {y) is the Total Derived Employment and the predictor variables 
are GNP Implicit Price Deflator with Year 1954 = 100 (xi). Gross National Product (^2), 
Unemployment (X3), Size of Armed Forces (X4), Non-Institutional Population Age 14 & Over 
(xs), and Year (xe). 

yi = Po + PlXii+ P2Xi2+ + PiXii + Pr:,Xir^+ PQXifs+ Ei i = 1,2, . . . ,77, (4.19) 

where the Si is assumed by normal distribution with mean zero and variance o"^ under PMLE. 
The least squares estimator (MLE) of /3 is 

(3 = 00, ...,(3q) = (-3482258.63, 15.0618, -0.0358, -2.0202, -1.0332, -0.0511, 1829.15). 

The sample variance is 55761.60. The bootstrap estimator of P{T > a} is d = P{T* > 
a\e} = E(l|y.>„||e), where i = (ii, . . .,£„). 
Let 

^ 0* - ^nX^X)0* - ^) ^ ELi^' 
pa* pa* 
where e* follows normal distribution with mean zero and variance a"^. Here p = 6 in equation 

To apply importance resampling in the setting, note that the original bootstrap distribution 

is 

rfP jj 1 _ff 



dC v27r(j 



-e 2*-: 
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The likelihood ratio based on an exponential change of measure is 



^ _ MT*-a)-Md) - ^^^^ (A 20) 

where il^a{G) = log Ee^^^*~^\ Let Ta = T* — a. The moment generating function of is 
E[e^'^°-] = e~^"(l — 2-)~2. Therefore, the change of measure Q is 



dQ dQ dP 0T*/^ r^^\-2LTT 1 TT 1 ^T~71 

- - e*"^ (1 - 2-) 2 — e = ^ — ^ — e 



dC dPdC p' ^\^ph[o ^\\f^ 

1=1 * 1=1 * 



Recall that = Q-e- Then, the optimal titling point of our proposed method can be 
obtained by solving 

Eq^[T*\T* >a]= ij'M = -« + log(l - 2 (4.21) 

Instead of doing simulation under the normal family, we propose a transformed likelihood 
ratio method to obtain the bootstrap estimator of P{T > a}. The method is doing simulation 

under the x^-distribution family. Note that T* = Yll^=i^i Ip^ where Xi = e* /a* is a x^- 
distribution with degree of freedom 1. Here the likelihood ratio based on an exponential change 
of measure is 

dP~ - Ee'(^*-y ^ ' 

where il)a{0) = log^e'^C^*""). By using an argument similar as above, the optimal titling point 
can be obtained by solving 

n 

Eq^[T*\T* >a]= ^'^{6) = -a + log(l - 2-)-^. (4.23) 

Note that equals flOT]) . 

Since the relative efficiencies of these two algorithms are almost the same, we only report that 
based on normal family. Table 5 reports the relative efficiency of the importance resampling with 
respect the naive Monte Carlo simulation. Here a denotes the true tail probability, a denotes 
the quantile of T*, a and vara are the mean and variance of the probability estimator with 
Monte Carlo, a{6) and vara{0) are the mean and the variance of the tail probability estimator 
with importance sampling, KE[a{9), a) is the relative efficiency of 6 relative to d in a regression 
model. Table 6 reports the non-coverage probabilities and averages and standard deviations of 
region areas for cubical and spherical confidence regions with nominal coverage probability 95%. 
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Table 5: Importance sampling with PMLE compared with naive method and a regression model 
in Monte Carlo simulation. 
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4.51 
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26.59 



Table 6: Non-coverage probabilities and averages and standard deviations of region areas for 
circular confidence regions with nominal coverage probability 95%. The sample is drawn from 
a regression model in Longley (1967). The parameter of interest is /3. Results are reported for 
studentised statistics and naive resampling and importance resampling. 
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Noncoverage 




Standard 


Method 


size 


probability 


Average 


deviation 


Naive 


1000 


0.054 


49214.4 


4853.05 


Importance 


400 


0.049 


51054.2 


5035.12 


Importance 


200 


0.051 


52563.7 


5174.63 


Importance 


100 


0.053 


54853.3 


5313.38 



5 Conclusions 

In this paper, we propose a general account in importance sampling with applications to portfolio 
VaR computation and bootstrapping confidence regions. It is shown that our method produces 
efficient approximation to the problem. Simulation results confirm the theoretical results that 
our method always provides greater variance reduction than the naive Monte Carlo method. 
Our numerical experiments demonstrate that the gain in variance reduction can be substantial 
in some cases. 

The key features of our method are twofold. First, (12.81) characterizes the optimal alternative 
distribution for importance sampling under exponential tilting. And the recursive algorithm 
facilitates the computation of the optimal solution. The initial value of the recursive algorithm 
is the dominating point of the large deviations tilting probability used previously by other 
authors; e.g., Sadowsky and Bucklew (1900). The recursive algorithm then sequentially generates 
alternative distributions providing greater variance reduction. Due to the nature of the recursive 
algorithm, the additional programming effort and computing time are negligible. Second, the 
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proposed tilting formula for normal distribution and its square, along with jump diffusion model, 
can be used to have more efficient simulation for portfolio VaR computation. We also apply the 
proposed titling probability to a parametric family (normal family or family) which can be 
used for constructing bootstrap confidence regions for the unknown parameters in regression 
models. 

Our method highlights the two aforementioned key features in general settings. By using 
the idea of conjugate probability measure, we obtained the optimal tilting parameter via (12. 8p . 
Specific considerations can be found in several papers; see Fuh and Hu (2004) for multivariate 
normal distributions, Fuh and Hu (2007) for hidden Markov models, and Fuh et al. (2011) for 
multivariate t distribution. Further applications to i^'-distributions and copula models will be 
published elsewhere. In this paper, we assume that the underlying random variables are indepen- 
dent over time. A more challenging project is to model the time dependence using, for example, 
Markov switching autoregression models or GARCH models. It is expected that our method 
can be applied to option pricing, Greeks letters calculation, and correlated default probabilities 
among others. 



Appendix: Proof of Theorem [T] 



The existence of the optimization problem (12. 3p can be proved by first showing that for 
^ G R, 

ip'{6) is strictly increasing and Eq[X\X G A] is strictly decreasing. (A.l) 

To prove f lA.l|) . we first note that ip{6) is the cumulant generating function of X, and therefore 
its second derivative ip"{0) > for 6 E I. This implies that ip'i^^) is strictly increasing. Since 
"if (9) is convex and steep by assumption, we have ip'{0) — )■ oo as 6* — )• O^a.^- Furthermore, consider 
the conditional measure of Q on the set A, denoted by Q^, which is defined as 

dQ. = (A.2) 

Then we have 

dEQ^[X\XEA] _ d ( E[l^x<,A}Xe''''] 



d9 dd V E[l{xeA}e-^^ 



E[^xeA}X'e-'^] E'[^xeA}Xe-<^''] 

+ -E^-. = -varoJX) < 0. 



This implies that Eq^ [X\X E A\is strictly decreasing. The existence of the optimization problem 
O follows from Eq^[X\X E A] = E[X\X eA]>^i = ^'{Q). 

To prove the uniqueness, we note that the second derivative of G equals 

= ^^[l{xeA}(+^'(^^))e-^^+'^(^)] 

= E[l{X6A}((-X + ^'(^^))2 + <(^))e-^^+'^(^)]. (A.3) 
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Since ip{6) is the cumulant generating function of X, its second derivative ip"{0) > 0. It then 
follows from (]A.3|) that ^ qq^' > 0, which implies that there exists a unique minimum of G{9). 



(A.4) 



To prove (12. 8p . we need to simplify the RHS of (12. 5p under Q. Standard algebra gives 

EmxeA}e-'>^] f l^,^A}e-<^'^dP/^{e) J l{.eA}dQ ' 
As a result, flA.4p equals 

j l{.eA}xdQA = EqJX] = Eq[X\X e A], 
which implies the desired result. 
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